Learning with Rare Cases and Small Disjuncts

Author

  • Gary M. Weiss
Abstract

Systems that learn from examples often create a disjunctive concept definition. Small disjuncts are those disjuncts that cover only a few training examples; the problem with small disjuncts is that they are more error prone than large disjuncts. This paper investigates why small disjuncts are more error prone than large disjuncts. It shows that when there are rare cases within a domain, factors such as attribute noise, missing attributes, class noise, and training-set size can result in small disjuncts being more error prone than large disjuncts and in rare cases being more error prone than common cases. The paper also assesses the impact that these error-prone small disjuncts and rare cases have on inductive learning (i.e., on error rate). One key conclusion is that when low levels of attribute noise are applied only to the training set (so that the ability to learn the correct concept is being evaluated), rare cases within a domain are primarily responsible for making learning difficult.
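The key quantity here is the size of a disjunct, i.e., how many training examples it covers. As a rough illustration of how small and large disjuncts can be compared, the Python sketch below treats each leaf of a learned decision tree as one disjunct and contrasts the test error rates of leaves that cover few versus many training examples. This is only a sketch under assumed settings: the dataset, the size cutoff of 5, and the default tree parameters are illustrative choices, not the experimental setup used in the paper.

# Minimal sketch (assumes scikit-learn and NumPy are installed): each decision-tree
# leaf is treated as one disjunct, its size being the number of training examples
# it covers. Dataset, SMALL threshold, and tree settings are illustrative assumptions.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# Disjunct size = number of training examples routed to each leaf.
train_leaves = tree.apply(X_tr)
leaf_size = {leaf: int(np.sum(train_leaves == leaf)) for leaf in np.unique(train_leaves)}

# Attribute each test-set error to the leaf (disjunct) that made the prediction.
test_leaves = tree.apply(X_te)
errors = tree.predict(X_te) != y_te

SMALL = 5  # illustrative cutoff: a "small" disjunct covers at most 5 training examples
for label, in_group in (("small", lambda s: s <= SMALL), ("large", lambda s: s > SMALL)):
    mask = np.array([in_group(leaf_size[leaf]) for leaf in test_leaves])
    if mask.any():
        print(f"{label} disjuncts: {mask.sum()} test examples, "
              f"error rate {errors[mask].mean():.3f}")

In the paper's terms, one would then vary factors such as attribute noise, class noise, and training-set size and observe how the errors concentrate in the small disjuncts.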


Similar articles

A Quantitative Study of Small Disjuncts

Systems that learn from examples often express the learned concept in the form of a disjunctive description. Disjuncts that correctly classify few training examples are known as small disjuncts and are interesting to machine learning researchers because they have a much higher error rate than large disjuncts. Previous research has investigated this phenomenon by performing ad hoc analyses of a ...

Concept Learning and the Problem of Small Disjuncts

Ideally, definitions induced from examples should consist of all, and only, disjuncts that are meaningful (e.g., as measured by a statistical significance test) and have a low error rate. Existing inductive systems create definitions that are ideal with regard to large disjuncts, but far from ideal with regard to small disjuncts, where a small (large) disjunct is one that correctly classifies few (many) training...

The Impact of Small Disjuncts on Classifier Learning

Many classifier induction systems express the induced classifier in terms of a disjunctive description. Small disjuncts are those disjuncts that classify few training examples. These disjuncts are interesting because they are known to have a much higher error rate than large disjuncts and are responsible for many, if not most, of all classification errors. Previous research has investigated thi...

Learning with Class Skews and Small Disjuncts

One of the main objectives of a Machine Learning (ML) system is to induce a classifier that minimizes classification errors. Two relevant topics in ML are understanding which domain characteristics and which inducer limitations might cause an increase in misclassification. In this sense, this work analyzes two important issues that might influence the performance of ML systems: class imbalan...

Publication year: 1995